Similarity Based Feature Weighting for Inter Domain Classification of Text
Similar resources
Similarity-Based Feature Weighting for Text Categorization
In automated text categorization, a system analyzes a natural-language document to decide whether it belongs in one or more of a group of pre-defined categories. The typical approach is to represent the documents using feature vectors, and inductively generate a classifier based on a training set of documents and their manually-assigned categories. Such a process ignores information on word ord...
An Improved Feature Weighting Method for Text Classification
Feature extraction is an important prerequisite for classifying text effectively and automatically. TF·IDF is widely used to express text feature weights, but it has some problems: it cannot reflect the distribution of terms in the text, and therefore cannot reflect their degree of importance or the differences between categories. This paper proposes a new feature weighting method, TF·IDF·Ci, to whic...
Poisson naive Bayes for text classification with feature weighting
In this paper, we investigate the use of a multivariate Poisson model and feature weighting to learn a naive Bayes text classifier. Our new naive Bayes text classification model assumes that a document is generated by a multivariate Poisson model, whereas previous work considers a document as a vector of binary term features based on the presence or absence of each term. We also explore the use of...
Bi-Weighting Domain Adaptation for Cross-Language Text Classification
Text classification is widely used in many real-world applications. To obtain satisfactory classification performance, most traditional data mining methods require large amounts of labeled data, which can be costly in terms of both time and human effort. In reality, there are plenty of such resources in English since it has the largest population in the Internet world, which is not true in many other langu...
A Novel Scheme for Improving Accuracy of KNN Classification Algorithm Based on the New Weighting Technique and Stepwise Feature Selection
The K nearest neighbor (KNN) algorithm is one of the most frequently used techniques in data mining for its integrity and performance. Though the KNN algorithm is highly effective in many cases, it has some essential deficiencies, which affect the classification accuracy of the algorithm. First, the effectiveness of the algorithm is affected by redundant and irrelevant features. Furthermore, this algori...
Journal
Journal title: JOURNAL OF MECHANICS OF CONTINUA AND MATHEMATICAL SCIENCES
Year: 2018
ISSN: 0973-8975,2454-7190
DOI: 10.26782/jmcms.2018.10.00014